정보과학회논문지 (Journal of KIISE)
Korean Title |
배깅 기반의 부트스트래핑을 이용한 개체명 인식 학습 기법 |
English Title |
A Named-Entity Recognition Training Method Using Bagging-Based Bootstrapping |
Author |
정유진
김주애
고영중
서정연
Yujin Jeong
Juae Kim
Youngjoong Ko
Jungyun Seo
|
Citation |
Vol. 45, No. 8, pp. 825-830 (2018. 08) |
Korean Abstract |
Existing named-entity (NE) recognition research is dominated by supervised learning. Supervised NE recognition performs well, but building a large labeled corpus requires considerable time and cost. To address this problem, this paper proposes a training method for NE recognition models that learns from labels generated automatically by an NE recognition model, without the effort of manually labeling a large corpus. Because the proposed method automatically generates a large number of NE labels from only a small labeled NE corpus and uses them for training, it greatly reduces the time and cost required to produce a large labeled corpus. In addition, a bagging technique is used to remove errors from the automatically generated labels. With the bootstrapping and bagging techniques added, the method recorded a best F1 score of 70.67%. The baseline CRF NE recognition model used for comparison recorded an F1 score of 65.59%.
|
English Abstract |
Most previous named-entity (NE) recognition studies have been based on supervised learning. Although supervised NE recognition performs well, constructing a large labeled corpus requires considerable time and cost. To solve this problem, we propose an NE recognition training method that uses an automatically generated labeled corpus. Because the proposed method trains on a large machine-labeled corpus, it greatly reduces the time and cost of labeling a corpus manually. In addition, a bagging-based bootstrapping technique is applied to correct errors in the machine-labeled data. Experimental results show that, with the bagging-based bootstrapping technique, the proposed method achieves its highest F1 score of 70.76%, which is 5.17%p higher than that of the baseline system.
|
Keyword |
개체명 인식
부트스트래핑
배깅
CRF
준지도학습
말뭉치 생성
Named-entity recognition
bootstrapping
bagging
CRF
semi-supervised learning
corpus generation
|